Search Results: "marco"

1 June 2016

Antoine Beaupré: (Still) working too much on the computer

I have been using Workrave to force myself to step away from the computer regularly, to work around the Repetitive Strain Injury (RSI) issues that have plagued my computing life intermittently over the last decade. Workrave itself is only marginally effective at getting me away from the machine: like any warning system, it suffers from alarm fatigue, as you frenetically click the dismiss button every time a warning pops up. However, it has other uses.

Analyzing data input

In the past, I have used Workrave to document how much I work on the computer, but never did any serious processing of the vast data store that Workrave accumulates about mouse movements and keystrokes. Interested in knowing how much my leave from Koumbit affected the time I spend on the computer, I decided to look into this again. It turns out I am working as much, if not more, on the computer since I took that "time off":

[graph: per-machine keystrokes per day, with average]

We can see here that I type a lot on the computer. Normal days range from 10 000 to 60 000 keystrokes, with extremes around 100 000 keystrokes per day. The average seems to fluctuate around 30 000 to 40 000 keystrokes per day, but rises sharply around the end of the second quarter of this very year. For those unfamiliar with the underlying technology, one keystroke is roughly one byte I put into the computer. So an average of 40 000 keystrokes is 40 kilobytes (KB) per day, which adds up to about 15 MB over a year, or roughly 150 MB (or 100 MiB if you want to be picky about it) over the course of the last decade. That is a lot of typing.

I originally thought this could be only because I type more now, as opposed to using the mouse more previously. Unfortunately, Workrave also tracks general "active time", which we can also examine:

[graph: per-machine active hours per day, with average]

Here we see that I work around 4 hours a day continuously on the computer. That is active time, not just login/logout time: the time where I look away from the computer and think for a while, jot down notes in my paper agenda or otherwise step away for small breaks is not counted here. Notice how some days go up to 12 hours and how recently the average went up to 7 hours of continuous activity. So we can clearly see that I basically work more on the computer now than I ever did in the last 7 years. This is a problem: one of the reasons for this time off was to step away from the computer, and it seems I have failed.
Update: it turns out the graph was skewed by the last samples. I went easier on the keyboard in the last few days and things have significantly improved:

[graph: per-machine keystrokes per day, with average, 3 days later]
Another interesting thing we can see is when I switched from using my laptop to using the server as my main workstation, around early 2011, which is about the time marcos was built. Now that marcos has been turned into a home cinema downstairs, I went back to using my laptop as my main computing device, in late 2015. We can also clearly see when I stopped using Koumbit machines near the end of 2015 as well.

Further improvements and struggle for meaning

The details of how the graph was produced are explained at the end of this article. This is all quite clunky: it doesn't help that the Workrave data structure is not easily parsable and is easily corruptible. It would be best if each data point were on its own line, which would make the files longer, granted, but much easier to parse. Furthermore, I do not like the perl/awk/gnuplot data processing pipeline much. It doesn't easily allow interesting analysis like averages, means and linear regressions. It could be interesting to rewrite the tools in Python to allow better graphs and easier data analysis, using the tools I learned in 2015-09-28-fun-with-batteries.

Finally, this looks only at keystrokes and non-idle activity. It could be more interesting to look at idle/active times and generally the amount of time spent on the computer each day. And while it is interesting to know that I basically write a small book on the computer every day (according to Wikipedia, 120 KB is about the size of a small pocket book), it is mostly meaningless if all that stuff is machine-readable code. Where is, after all, the meaning in all those shell commands and programs we constantly input on our keyboards, in the grand scheme of human existence? Most of those bytes are bound to be destroyed by garbage collection (my shell's history) or catastrophic backup failures.
While the creative works from the 16th century can still be accessed and used by others, the data in some software programs from the 1990s is already inaccessible. - Lawrence Lessig
But is my shell history relevant? Looking back at old posts on this blog, one has to wonder if the battery life of the Thinkpad 380z laptop or how much e-waste I threw away in 2005 will be of any historical value in 20 years, assuming the data survives that long.

How this graph was made

I was happy to find that Workrave has some contrib scripts to do such processing. Unfortunately, those scripts are not shipped with the Debian package, so I requested that this be fixed (#825982). There were also some fixes necessary to make the script work at all: first, there was a syntax error in the Perl script. Then, since my data is so old, there was bound to be some corruption in there: incomplete entries or just plain broken data. I had lines that were all NULL characters, typical of power failures or disk corruption. So I made a patch to fix that script (#826021). But this wasn't enough: while it processes data on the current machine fine, it doesn't deal with multiple machines very well. In the last 7 years of data I could find, I was using 3 different machines: this server (marcos), my laptop (angela) and Koumbit's office servers (koumbit). I ended up modifying the contrib scripts to be able to collate that data meaningfully. First, I copied the data from Koumbit into a local fake-koumbit directory. Second, I mounted marcos' home directory locally with SSHFS:
sshfs anarc.at:/home/anarcat marcos
I also made this script to sum up datasets:
#!/usr/bin/perl -w

use List::MoreUtils 'pairwise';

$| = 1;

my %data = ();
while (<>) {
    my @fields = split;
    my $date = shift @fields;
    if (defined($data{$date})) {
        my @t = pairwise { $a + $b } @{$data{$date}}, @fields;
        $data{$date} = \@t;
    }
    else {
        $data{$date} = \@fields;
    }
}
foreach my $d ( sort keys %data ) {
    print "$d @{$data{$d}}\n";
}
Then I made a modified version of the Gnuplot script that processes all those files together:
#!/usr/bin/gnuplot
set title "Workrave"
set ylabel "Keystrokes per day"
set timefmt "%Y%m%d%H%M"
#set xrange [450000000:*]
set format x "%Y-%m-%d"
set xtics rotate
set xdata time
set terminal svg
set output "workrave.svg"
plot "workrave-angela.dat" using 1:28 title "angela", \
     "workrave-marcos.dat" using 1:28 title "marcos", \
     "workrave-koumbit.dat" using 1:28 title "koumbit", \
     "workrave-sum.dat" using 1:2 smooth sbezier linewidth 3 title "average"
#plot "workrave-angela.dat" using 1:28 smooth sbezier title "angela", \
#     "workrave-marcos.dat" using 1:28 smooth sbezier title "marcos", \
#     "workrave-koumbit.dat" using 1:28 smooth sbezier title "koumbit"
And finally, I made a small shell script to glue this all together:
#!/bin/sh
perl workrave-dump > workrave-$(hostname).dat
HOME=$HOME/marcos perl workrave-dump > workrave-marcos.dat
HOME=$PWD/fake-koumbit perl workrave-dump > workrave-koumbit.dat
# remove idle days as they skew the average
sed -i '/ 0$/d' workrave-*.dat
# per-day granularity
sed -i 's/^\(........\)....\? /\1 /' workrave-*.dat
# sum up all graphs
cat workrave-*.dat | sort | perl sum.pl > workrave.dat
./gnuplot-workrave-anarcat
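Out of curiosity, a rough overall average can also be computed from the summed data directly on the command line. This is just a sketch, assuming the keystroke count ends up in column 28 of workrave.dat, as in the per-machine plots above:
# rough mean keystrokes per day over the whole dataset
# (assumes column 28 holds the keystroke count, as in the gnuplot scripts)
awk '{ sum += $28; n++ } END { if (n) printf "%d days, %.0f keystrokes/day on average\n", n, sum / n }' workrave.dat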
I used a different gnuplot script to generate the activity graph:
#!/usr/bin/gnuplot
set title "Workrave"
set ylabel "Active hours per day"
set timefmt "%Y%m%d%H%M"
#set xrange [450000000:*]
set format x "%Y-%m-%d"
set xtics rotate
set xdata time
set terminal svg
set output "workrave.svg"
plot "workrave-angela.dat"  using 1:($23/3600) title "angela", \
     "workrave-marcos.dat"  using 1:($23/3600) title "marcos", \
     "workrave-koumbit.dat" using 1:($23/3600) title "koumbit", \
     "workrave.dat"         using 1:($23/3600) title "average" smooth sbezier linewidth 3
#plot "workrave-angela.dat" using 1:28 smooth sbezier title "angela", \
#     "workrave-marcos.dat" using 1:28 smooth sbezier title "marcos", \
#     "workrave-koumbit.dat" using 1:28 smooth sbezier title "koumbit"

14 May 2016

Antoine Beaupré: Long delays posting Debian Planet Venus

For the last few months, it seems that my posts haven't been reaching the Planet Debian aggregator correctly. I timed the last two posts and they both arrived roughly 10 days late in the feed.

SNI issues

At first, I suspected I was a victim of the SNI bug in Planet Venus: since it is still running on Python 2.7 and uses httplib2 (as opposed to, say, Requests), it has trouble with sites running under SNI. In January, there were 9 blogs with that problem on Planet. When this was discussed elsewhere in February, there were 18, and then 21 were reported in March. With everyone (like me) enabling Let's Encrypt on their website, this number is bound to grow. I was able to reproduce the Debian Planet setup locally to do further tests and ended up sending two (unrelated) patches to the Debian bug tracker against Planet Venus, the software running Debian Planet. In my local tests, I found 22 hosts with SNI problems. I also posted some pointers on how the code could be ported over to the more modern Requests and CacheControl modules.

Expiry issues

However, some of those feeds were working fine on philp, the host I found to be running as the Planet Master. Even stranger, my own website was working fine!
INFO:planet.runner:Feed https://anarc.at/tag/debian-planet/index.rss unchanged
Now that was strange: why was my feed fetched, but noted as unchanged? Digging around, I found a FAQ entry buried in the PlanetDebian wiki page which explicitly says that Planet obeys Expires headers diligently and will not fetch content again before it expires. Skeptical, I looked at my own headers and, ta-da! they were way off:
$ curl -v https://anarc.at/tag/debian-planet/index.rss 2>&1 | egrep '< (Expires|Date)'
< Date: Sat, 14 May 2016 19:59:28 GMT
< Expires: Sat, 28 May 2016 19:59:28 GMT
So I lowered the Expires timeout on my RSS feeds to 4 hours after the last modification:
root@marcos:/etc/apache2# git diff
diff --git a/apache2/conf-available/expires.conf b/apache2/conf-available/expires.conf
index 214f3dd..a983738 100644
--- a/apache2/conf-available/expires.conf
+++ b/apache2/conf-available/expires.conf
@@ -3,8 +3,18 @@
   # Enable expirations.
   ExpiresActive On
-  # Cache all files for 2 weeks after access (A).
-  ExpiresDefault A1209600
+  # Cache all files 12 hours after access
+  ExpiresDefault "access plus 12 hours"
+
+  # RSS feeds should refresh more often
+  <FilesMatch \.(rss)$>
+    ExpiresDefault "modification plus 4 hours"
+  </FilesMatch> 
+
+  # images are *less* likely to change
+  <FilesMatch "\.(gif|jpg|png|js|css)$">
+    ExpiresDefault "access plus 1 month"
+  </FilesMatch>
   <FilesMatch \.(php|cgi)$>
     # Do not allow scripts to be cached unless they explicitly send cache
I also lowered the general cache expiry, except for images, Javascript and CSS.
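After reloading Apache, a quick way to confirm the new headers are actually being served is to inspect the feed again; a minimal check along the lines of the curl command above:
# the Expires header should now sit a few hours after the Date header
curl -sI https://anarc.at/tag/debian-planet/index.rss | egrep '^(Date|Expires|Cache-Control)'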

Planet Venus maintenance

A small last word about all this: I'm surprised to see that Planet Debian is running a 6-year-old piece of software that hasn't seen a single official release yet, with local patches on top. Venus seems well designed, I must give them that, but it's a little worrisome to see great software just rotting away like this. A good "planet" site seems like a resource a lot of FLOSS communities would need: is there another "Planet-like" aggregator out there that is well maintained and more reliable? In Python, preferably. PlanetPlanet, which Venus was forked from, is out of the question: it is even less maintained than the newer fork, which itself seems to have died in 2011. There is a discussion about the state of Venus on GitHub which reflects some of the concerns expressed here, as well as on the mailing list. The general consensus seems to be that everyone should switch over to Planet Pluto, which is written in Ruby. I am not sure which planet Debian sits on - Pluto? Venus? Besides, Pluto is not even a planet anymore...

Mike check!

So this is also a test to see if my posts reach Debian Planet correctly. I suspect no one will ever see this at the top of their feeds, since the posts do get there, but with a 10-day delay and with the original date, so they are "sunk" down. The above expiration fixes won't take effect until the 10-day delay is over... But if you did see this as noise, retroactive apologies in advance for the trouble. If you are reading this from somewhere else and wish to say hi, don't hesitate, it's always nice to hear from my readers.

12 May 2016

Antoine Beaupré: Notmuch, offlineimap and Sieve setup

I've been using Notmuch since about 2011, switching away from Mutt to deal with the monstrous amount of email I was, and still am, dealing with on the computer. I have contributed a few patches and configs on the Notmuch mailing list, but basically, I have given up on merging patches and instead have a custom config in Emacs that extends it the way I want. In the last 5 years, Notmuch has progressed significantly, so I haven't found the need to patch it or make sweeping changes.

The huge INBOX of death

The one thing that is problematic with my use of Notmuch is that I end up with a ridiculously large INBOX folder. Before the cleanup I did this morning, I had over 10k emails in there, out of about 200k emails overall. Since I mostly work from my laptop these days, the Notmuch tags are only on the laptop and not propagated to the server. This makes accessing the mail spool directly, from webmail or simply through a local client (say Mutt) on the server, really inconvenient, because it has to load a very large spool of mail, which is very slow in Mutt. Even worse, a bunch of mail that was archived in Notmuch still shows up in the spool, because archiving just removes tags in Notmuch: the mails are still in the inbox, even though they are marked as read. So I was hoping that Notmuch would help me deal with the giant inbox of death problem, but in fact, when I don't use Notmuch, it actually makes the problem worse. Today, I did a bunch of improvements to my setup to fix that. The first thing I did was to kill procmail, which I was surprised to discover has been dead for over a decade. I switched over to Sieve for filtering, having already switched to Dovecot a while back on the server. I tried the procmail2sieve.pl conversion tool but it didn't work very well, so I basically rewrote the whole file. Since I was mostly using Notmuch for filtering, there wasn't much left to convert.

Sieve filtering

But this is where things got interesting: Sieve is so much simpler to use and more intuitive that I started doing more interesting stuff in bridging the filtering system (Sieve) with the tagging system (Notmuch). Basically, I use Sieve to split large chunks of email off my main inbox, to try to remove as much spam, bulk email, notifications and mailing lists as possible from the larger flow of emails. Then Notmuch comes in and does some fine-tuning, assigning tags to specific mailing lists or topics, and being generally the awesome search engine that I use on a daily basis.

Dovecot and Postfix configs

For all of this to work, I had to tweak my mail servers to talk Sieve. First, I enabled Sieve in Dovecot:
--- a/dovecot/conf.d/15-lda.conf
+++ b/dovecot/conf.d/15-lda.conf
@@ -44,5 +44,5 @@
 protocol lda {
   # Space separated list of plugins to load (default is global mail_plugins).
-  #mail_plugins = $mail_plugins
+  mail_plugins = $mail_plugins sieve
 }
Then I had to switch from procmail to Dovecot for local delivery. That was easy, in Postfix's perennial main.cf:
#mailbox_command = /usr/bin/procmail -a "$EXTENSION"
mailbox_command = /usr/lib/dovecot/dovecot-lda -a "$RECIPIENT"
Note that dovecot-lda takes the full recipient as an argument, not just the extension. That's normal. It's clever, it knows that kind of stuff. One last tweak I did was to enable automatic mailbox creation and subscription, so that the automatic extension filtering (below) can create mailboxes on the fly:
--- a/dovecot/conf.d/15-lda.conf
+++ b/dovecot/conf.d/15-lda.conf
@@ -37,10 +37,10 @@
 #lda_original_recipient_header =
 # Should saving a mail to a nonexistent mailbox automatically create it?
-#lda_mailbox_autocreate = no
+lda_mailbox_autocreate = yes
 # Should automatically created mailboxes be also automatically subscribed?
-#lda_mailbox_autosubscribe = no
+lda_mailbox_autosubscribe = yes
 protocol lda {
   # Space separated list of plugins to load (default is global mail_plugins).
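As a sanity check, the effective settings can be dumped on the server afterwards; a quick sketch using the standard Dovecot and Postfix introspection commands:
# show the non-default Dovecot settings, filtered for the bits changed above
doveconf -n | egrep 'mail_plugins|lda_mailbox_auto'
# confirm Postfix now delivers through dovecot-lda
postconf mailbox_command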

Sieve rules

Then I had to create a Sieve ruleset. That thing lives in ~/.dovecot.sieve, since I'm running Dovecot. Your provider may accept an arbitrary ruleset like this, or you may need to go through a web interface, or who knows. I'm assuming you're running Dovecot and have a shell from now on. The first part of the file simply enables a bunch of extensions, as needed:
# Sieve Filters
# http://wiki.dovecot.org/Pigeonhole/Sieve/Examples
# https://tools.ietf.org/html/rfc5228
require "fileinto";
require "envelope";
require "variables";
require "subaddress";
require "regex";
require "vacation";
require "vnd.dovecot.debug";
Some of those are not used yet; for example, I haven't tested the vacation module yet, but I have good hopes that I can use it as a way to announce a special "urgent" mailbox while I'm traveling. The rationale is to have a distinct mailbox for urgent messages that is announced in the autoreply, but that hopefully won't be parsable by bots.

Spam filtering

Then I filter spam using this fairly standard expression:
########################################################################
# spam 
# possible improvement, server-side:
# http://wiki.dovecot.org/Pigeonhole/Sieve/Examples#Filtering_using_the_spamtest_and_virustest_extensions
if header :contains "X-Spam-Flag" "YES" {
  fileinto "junk";
  stop;
} elsif header :contains "X-Spam-Level" "***" {
  fileinto "greyspam";
  stop;
}
This puts stuff into the junk or greyspam folder, based on the severity. I am very aggressive with spam: stuff often ends up in the greyspam folder, which I need to check from time to time, but it beats having too much spam in my inbox.

Mailing lists

Mailing lists are generally put into a lists folder, with some mailing lists getting their own folder:
########################################################################
# lists
# converted from procmail
if header :contains "subject" "FreshPorts" {
    fileinto "freshports";
} elsif header :contains "List-Id" "alternc.org" {
    fileinto "alternc";
} elsif header :contains "List-Id" "koumbit.org" {
    fileinto "koumbit";
} elsif header :contains ["to", "cc"] ["lists.debian.org",
                                       "anarcat@debian.org"] {
    fileinto "debian";
# Debian BTS
} elsif exists "X-Debian-PR-Message" {
    fileinto "debian";
# default lists fallback
} elsif exists "List-Id" {
    fileinto "lists";
}
The idea here is that I can safely subscribe to lists without polluting my mailbox by default. Further processing is done in Notmuch.

Extension matching

I also use the magic +extension tag on emails. If you send email to, say, foo+extension@example.com then the emails end up in the foo folder. This is done with the help of the following recipe:
########################################################################
# wildcard +extension
# http://wiki.dovecot.org/Pigeonhole/Sieve/Examples#Plus_Addressed_mail_filtering
if envelope :matches :detail "to" "*" {
  # Save name in ${name} in all lowercase except for the first letter.
  # Joe, joe, jOe thus all become 'Joe'.
  set :lower "name" "${1}";
  fileinto "${name}";
  #debug_log "filed into mailbox ${name} because of extension";
  stop;
}
This is actually very effective: any time I register for a service, I try as much as possible to add a +extension that describes the service. Of course, spammers and marketers (it's the same really) are free to drop the extension and I suspect a lot of them do, but it helps with honest providers and this actually sorts a lot of stuff out of my inbox into topically-defined folders. It is also a security issue: someone could flood my filesystem with tons of mail folders, which would cripple the IMAP server and eat all the inodes, 4 times faster than just sending emails. But I guess I'll cross that bridge when I get there: anyone can flood my address and I have other mechanisms to deal with this. The trick is to then assign tags to all folders so that they appear in the Notmuch-emacs welcome view:
echo tagging folders
for folder in $(ls -ad $HOME/Maildir/${PREFIX}*/ | egrep -v "Maildir/${PREFIX}(feeds.*|Sent.*|INBOX/|INBOX/Sent)\$"); do
    tag=$(echo $folder | sed 's#/$##;s#^.*/##')
    notmuch tag +$tag -inbox tag:inbox and not tag:$tag and folder:${PREFIX}$tag
done
This is part of my notmuch-tag script that includes a lot more fine-tuned filtering, detailed below.

Automated reports filtering

Another thing I get a lot of is machine-generated "spam". Well, it's not commercial spam, but it's a bunch of Nagios, cron jobs, and god knows what other software thinks it's important to send me emails every day. I get a lot less of those these days since I'm off work at Koumbit, but the rules can still be useful for others:
if anyof (exists "X-Cron-Env",
          header :contains ["subject"] ["security run output",
                                        "monthly run output",
                                        "daily run output",
                                        "weekly run output",
                                        "Debian Package Updates",
                                        "Debian package update",
                                        "daily mail stats",
                                        "Anacron job",
                                        "nagios",
                                        "changes report",
                                        "run output",
                                        "[Systraq]",
                                        "Undelivered mail",
                                        "Postfix SMTP server: errors from",
                                        "backupninja",
                                        "DenyHosts report",
                                        "Debian security status",
                                        "apt-listchanges"
                                        ],
           header :contains "Auto-Submitted" "auto-generated",
           envelope :contains "from" ["nagios@",
                                      "logcheck@"])
{
    fileinto "rapports";
}
# imported from procmail
elsif header :comparator "i;octet" :contains "Subject" "Cron" {
  if header :regex :comparator "i;octet" "From" ".*root@" {
        fileinto "rapports";
  }
}
elsif header :comparator "i;octet" :contains "To" "root@" {
  if header :regex :comparator "i;octet" "Subject" "\\*\\*\\* SECURITY" {
        fileinto "rapports";
  }
}
elsif header :contains "Precedence" "bulk" {
    fileinto "bulk";
}

Refiltering emails

Of course, after all this I still had thousands of emails in my inbox, because the Sieve filters apply only to new emails. The beauty of Sieve support in Dovecot is that there is a neat sieve-filter command that can reprocess an existing mailbox. That was a lifesaver. To run a specific sieve filter on a mailbox, I simply run:
sieve-filter .dovecot.sieve INBOX 2>&1 | less
Well, this doesn't actually do anything. To really execute the filters, you need the -e flag, and to write to the INBOX for real, you need the -W flag as well, so the real run looks something more like this:
sieve-filter -e -W -v .dovecot.sieve INBOX > refilter.log 2>&1
The funky output redirects are necessary because this outputs a lot of crap. Also note that, unfortunately, the fake run output differs from the real run and is actually more verbose, which makes it less useful than it could be.

Archival

I also usually archive my mails every year, rotating my mailbox into an Archive.YYYY directory. For example, all mails from 2015 are now archived in an Archive.2015 directory. I used to do this with Mutt tagging and it was a little slow and error-prone. Now, I simply have this Sieve script:
require ["variables","date","fileinto","mailbox", "relational"];
# Extract date info
if currentdate :matches "year" "*" { set "year" "${1}"; }

if date :value "lt" :originalzone "date" "year" "${year}"
{
  if date :matches "received" "year" "*"
  {
    # Archive Dovecot mailing list items by year and month.
    # Create folder when it does not exist.
    fileinto :create "Archive.${1}";
  }
}
I went from 15613 to 1040 emails in my real inbox with this process (including refiltering with the default filters as well).
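A simple way to keep track of that kind of progress is to count messages before and after a refiltering pass; a small sketch, assuming the spool is indexed by Notmuch:
# how many messages still carry the inbox tag
notmuch count tag:inbox
# how many messages physically sit in the INBOX folder
notmuch count folder:INBOX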

Notmuch configuration

My Notmuch configuration is in three parts: I have small settings in ~/.notmuch-config. The gist of it is:
[new]
tags=unread;inbox;
ignore=
#[maildir]
# synchronize_flags=true
# tentative patch that was refused upstream
# http://mid.gmane.org/1310874973-28437-1-git-send-email-anarcat@koumbit.org
#reckless_trash=true
[search]
exclude_tags=deleted;spam;
I omitted the fairly trivial [user] section for privacy reasons and [database] to reduce clutter. Then I have a notmuch-tag script symlinked into ~/Maildir/.notmuch/hooks/post-new. It does way too much stuff to describe in detail here, but here are a few snippets:
if hostname | grep angela > /dev/null; then
    PREFIX=Anarcat/
else
    PREFIX=.
fi
This sets a variable that makes the script work on my laptop (angela), where mailboxes are in Maildir/Anarcat/foo, or on the server, where mailboxes are in Maildir/.foo. I also have special rules to tag my RSS feeds, which are generated by feed2imap, documented further below:
echo tagging feeds
( cd $HOME/Maildir/ && for feed in ${PREFIX}feeds.*; do
    name=$(echo $feed | sed "s#${PREFIX}feeds\\.##")
    notmuch tag +feeds +$name -inbox folder:$feed and not tag:feeds
done )
Another useful example is how to tag mailing lists; for instance, this removes the inbox tag and adds the lists and notmuch tags to emails from the notmuch mailing list:
notmuch tag +lists +notmuch      -inbox tag:inbox and "to:notmuch@notmuchmail.org"
Finally, I have a bunch of special keybindings in ~/.emacs.d/notmuch-config.el:
;; autocompletion
(eval-after-load "notmuch-address"
  '(progn
     (notmuch-address-message-insinuate)))
; use fortune for signature, config is in custom
(add-hook 'message-setup-hook 'fortune-to-signature)
; don't remember what that is
(add-hook 'notmuch-show-hook 'visual-line-mode)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; keymappings
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(define-key notmuch-show-mode-map "S"
  (lambda ()
    "mark message as spam and advance"
    (interactive)
    (notmuch-show-tag '("+spam" "-unread"))
    (notmuch-show-next-open-message-or-pop)))
(define-key notmuch-search-mode-map "S"
  (lambda (&optional beg end)
    "mark message as spam and advance"
    (interactive (notmuch-search-interactive-region))
    (notmuch-search-tag (list "+spam" "-unread") beg end)
    (anarcat/notmuch-search-next-message)))
(define-key notmuch-show-mode-map "H"
  (lambda ()
    "mark message as spam and advance"
    (interactive)
    (notmuch-show-tag '("-spam"))
    (notmuch-show-next-open-message-or-pop)))
(define-key notmuch-search-mode-map "H"
  (lambda (&optional beg end)
    "mark message as spam and advance"
    (interactive (notmuch-search-interactive-region))
    (notmuch-search-tag (list "-spam") beg end)
    (anarcat/notmuch-search-next-message)))
(define-key notmuch-search-mode-map "l" 
  (lambda (&optional beg end)
    "undelete and advance"
    (interactive (notmuch-search-interactive-region))
    (notmuch-search-tag (list "-unread") beg end)
    (anarcat/notmuch-search-next-message)))
(define-key notmuch-search-mode-map "u"
  (lambda (&optional beg end)
    "undelete and advance"
    (interactive (notmuch-search-interactive-region))
    (notmuch-search-tag (list "-deleted") beg end)
    (anarcat/notmuch-search-next-message)))
(define-key notmuch-search-mode-map "d"
  (lambda (&optional beg end)
    "delete and advance"
    (interactive (notmuch-search-interactive-region))
    (notmuch-search-tag (list "+deleted" "-unread") beg end)
    (anarcat/notmuch-search-next-message)))
(define-key notmuch-show-mode-map "d"
  (lambda ()
    "delete current message and advance"
    (interactive)
    (notmuch-show-tag '("+deleted" "-unread"))
    (notmuch-show-next-open-message-or-pop)))
;; https://notmuchmail.org/emacstips/#index17h2
(define-key notmuch-show-mode-map "b"
  (lambda (&optional address)
    "Bounce the current message."
    (interactive "sBounce To: ")
    (notmuch-show-view-raw-message)
    (message-resend address)
    (kill-buffer)))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; my custom notmuch functions
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun anarcat/notmuch-search-next-thread ()
  "Skip to next message from region or point
This is necessary because notmuch-search-next-thread just starts
from point, whereas it seems to me more logical to start from the
end of the region."
  ;; move line before the end of region if there is one
  (unless (= beg end)
    (goto-char (- end 1)))
  (notmuch-search-next-thread))
;; Linking to notmuch messages from org-mode
;; https://notmuchmail.org/emacstips/#index23h2
(require 'org-notmuch nil t)
(message "anarcat's custom notmuch config loaded")
This is way too long: in my opinion, a bunch of that stuff should be factored in upstream, but some features have been hard to get in. For example, Notmuch is really hesitant about marking emails as deleted. The community is also very strict about having unit tests for everything, which makes writing new patches a significant challenge for a newcomer, who will often need to be familiar with both Elisp and C. So for now I just carry those configs around. Emails marked as deleted or spam are processed with the following script, named notmuch-purge, which I symlink to ~/Maildir/.notmuch/hooks/pre-new:
#!/bin/sh
if hostname | grep angela > /dev/null; then
    PREFIX=Anarcat/
else
    PREFIX=.
fi
echo moving tagged spam to the junk folder
notmuch search --output=files tag:spam \
        and not folder:${PREFIX}junk \
        and not folder:${PREFIX}greyspam \
        and not folder:Koumbit/INBOX \
        and not path:Koumbit/** \
    | while read file; do
          mv "$file" "$HOME/Maildir/${PREFIX}junk/cur"
      done
echo unconditionally deleting deleted mails
notmuch search --output=files tag:deleted | xargs -r rm
Oh, and there's also customization for Notmuch:
;; -*- mode: emacs-lisp; auto-recompile: t; -*-
(custom-set-variables
 ;; from https://anarc.at/sigs.fortune
 '(fortune-file "/home/anarcat/.mutt/sigs.fortune")
 '(message-send-hook (quote (notmuch-message-mark-replied)))
 '(notmuch-address-command "notmuch-address")
 '(notmuch-always-prompt-for-sender t)
 '(notmuch-crypto-process-mime t)
 '(notmuch-fcc-dirs
   (quote
    ((".*@koumbit.org" . "Koumbit/INBOX.Sent")
     (".*" . "Anarcat/Sent"))))
 '(notmuch-hello-tag-list-make-query "tag:unread")
 '(notmuch-message-headers (quote ("Subject" "To" "Cc" "Bcc" "Date" "Reply-To")))
 '(notmuch-saved-searches
   (quote
    ((:name "inbox" :query "tag:inbox and not tag:koumbit and not tag:rt")
     (:name "unread inbox" :query "tag:inbox and tag:unread")
     (:name "unread" :query "tag:unread")
     (:name "freshports" :query "tag:freshports and tag:unread")
     (:name "rapports" :query "tag:rapports and tag:unread")
     (:name "sent" :query "tag:sent")
     (:name "drafts" :query "tag:draft"))))
 '(notmuch-search-line-faces
   (quote
    (("deleted" :foreground "red")
     ("unread" :weight bold)
     ("flagged" :foreground "blue"))))
 '(notmuch-search-oldest-first nil)
 '(notmuch-show-all-multipart/alternative-parts nil)
 '(notmuch-show-all-tags-list t)
 '(notmuch-show-insert-text/plain-hook
   (quote
    (notmuch-wash-convert-inline-patch-to-part notmuch-wash-tidy-citations notmuch-wash-elide-blank-lines notmuch-wash-excerpt-citations)))
 )
I think that covers it.

Offlineimap

So of course the above works well on the server directly, but how do I run Notmuch on a remote machine that doesn't have access to the mail spool directly? This is where OfflineIMAP comes in. It allows me to incrementally synchronize a local Maildir folder hierarchy with a remote IMAP server. I am assuming you already have an IMAP server configured, since you already configured Sieve above. Note that other synchronization tools exist. The other popular one is isync, but I had trouble migrating to it (see courriels for details), so for now I am sticking with OfflineIMAP. The configuration is fairly simple:
[general]
accounts = Anarcat
ui = Blinkenlights
maxsyncaccounts = 3
[Account Anarcat]
localrepository = LocalAnarcat
remoterepository = RemoteAnarcat
# refresh all mailboxes every 10 minutes
autorefresh = 10
# run notmuch after refresh
postsynchook = notmuch new
# sync only mailboxes that changed
quick = -1
## possible optimisation: ignore mails older than a year
#maxage = 365
# local mailbox location
[Repository LocalAnarcat]
type = Maildir
localfolders = ~/Maildir/Anarcat/
# remote IMAP server
[Repository RemoteAnarcat]
type = IMAP
remoteuser = anarcat
remotehost = anarc.at
ssl = yes
# without this, the cert is not verified (!)
sslcacertfile = /etc/ssl/certs/DST_Root_CA_X3.pem
# do not sync archives
folderfilter = lambda foldername: not re.search('(Sent\.20[01][0-9]\..*)', foldername) and not re.search('(Archive.*)', foldername)
# and only subscribed folders
subscribedonly = yes
# don't reconnect all the time
holdconnectionopen = yes
# get mails from INBOX immediately, doesn't trigger postsynchook
idlefolders = ['INBOX']
Critical parts are:
  • postsynchook: obviously, we want to run notmuch after fetching mail
  • idlefolders: receives emails immediately without waiting for the longer autorefresh delay, which means that most mailboxes don't see new emails until 10 minutes later in the worst case. Unfortunately, it doesn't run the postsynchook, so I need to hit G in Emacs to see new mail
  • quick=-1, subscribedonly, holdconnectionopen: makes most runs much, much faster as it skips unchanged or unsubscribed folders and keeps the connection to the server
The other settings should be self-explanatory.
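With this in place, a one-shot synchronization can be run by hand before leaving OfflineIMAP to loop on its own; a quick sketch of the invocations I would expect, given the account name above:
# one-off run for the Anarcat account, then exit
offlineimap -o -a Anarcat
# normal mode: keep running and autorefresh every 10 minutes as configured
offlineimap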

RSS feeds

I gave up on RSS readers, or more precisely, I merged RSS feeds and email. The first time I heard of this, it sounded like a horrible idea, because it means yet more email! But with proper filtering, it's actually a really nice way to process feeds, since it leverages the distributed nature of email. For this I use a fairly standard feed2imap setup, although I do not deliver to an IMAP server, but straight to a local Maildir. The configuration looks like this:
---
include-images: true
target-prefix: &target "maildir:///home/anarcat/Maildir/.feeds."
feeds:
- name: Planet Debian
  url: http://planet.debian.org/rss20.xml
  target: [ *target, 'debian-planet' ]
I obviously have more feeds; the above is just an example. This will deliver the feeds as emails, one mailbox per feed, in ~/Maildir/.feeds.debian-planet in the above example.
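feed2imap is then simply run periodically (from cron, for example) and Notmuch picks up the new messages on the next indexing run; a minimal sketch, assuming the configuration lives in the default ~/.feed2imaprc:
# fetch all configured feeds into the local Maildir, then index them
feed2imap && notmuch new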

Troubleshooting

You will fail at writing the Sieve filters correctly, and mail will (hopefully?) fall through to your regular mailbox. Syslog will tell you when things fail, as expected, and details end up in the .dovecot.sieve.log file in your home directory. I also enabled debugging on the Sieve module:
--- a/dovecot/conf.d/90-sieve.conf
+++ b/dovecot/conf.d/90-sieve.conf
@@ -51,6 +51,7 @@ plugin {
   # deprecated imapflags extension in addition to all extensions were already
   # enabled by default.
   #sieve_extensions = +notify +imapflags
+  sieve_extensions = +vnd.dovecot.debug
   # Which Sieve language extensions are ONLY available in global scripts. This
   # can be used to restrict the use of certain Sieve extensions to administrator
This allowed me to use the debug_log function in the rulesets to output messages directly to the logfile.
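Two more things that help while iterating on the ruleset are checking its syntax before any mail gets delivered and watching the log; a small sketch, assuming Pigeonhole's sievec compiler is installed:
# compile the ruleset to catch syntax errors without delivering anything
sievec ~/.dovecot.sieve
# watch filtering errors and debug_log output as mail comes in
tail -f ~/.dovecot.sieve.log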

Further improvements

Of course, this is all done on the command line, but that is somewhat expected if you are already running Notmuch. It would of course be much easier to edit those filters through a GUI. Roundcube has a nice Sieve plugin, and Thunderbird has such a plugin as well. Since Sieve is a standard, there's a bunch of clients available. All of those need you to set up something on the server, which I haven't bothered doing yet. And of course, a key improvement would be to have Notmuch synchronize its state better with the mailboxes directly, instead of relying on the notmuch-purge hack above. Dovecot and the Maildir format support up to 26 flags, and there were discussions about using those flags to synchronize with Notmuch tags so that multiple Notmuch clients can see the same tags on different machines transparently. This, however, won't make Notmuch work on my phone or webmail or any other more generic client: for that, Sieve rules are still very useful. I still don't have webmail set up at all: to read email, I need an actual client, which is currently my phone, which means I need Wifi access to read email. "Internet Cafés" or "this guy's computer" won't work as well, although I can always use ssh to log in straight to the server and read mail with Mutt. I am also considering using X509 client certificates to authenticate to the mail server without a passphrase. This involves configuring Postfix, which seems simple enough. Dovecot's configuration seems a little more involved and less well documented. It seems that both OfflineIMAP and K-9 Mail support client-side certs. OfflineIMAP prompts me for the password so it doesn't get leaked anywhere. I am a little concerned about building yet another CA, but I guess it would not be so hard... The server side of things needs more documenting, particularly the spam filters. This is currently spread around this wiki, mostly in configuration.

Security considerations

The whole purpose of this was to make it easier to read my mail on other devices. This introduces a new vulnerability: someone may steal that device or compromise it to read my mail, impersonate me on various services, and even get a shell on the remote server. Thanks to the two-factor authentication I set up on the server, I feel a little more confident that just getting the passphrase to the mail account isn't sufficient anymore to leverage shell access. It also allows me to log in with ssh on the server without trusting the machine too much, although that only goes so far... Of course, sudo is then out of the question and I must assume that everything I see is also seen by the attacker, who can also inject keystrokes and do all sorts of nasty things. Since I also connected my email account on my phone, someone could steal the phone and start impersonating me. The mitigation here is that there is a PIN for the screen lock, and the phone is encrypted. Encryption isn't so great when the passphrase is a PIN, but I'm working on having a better key that is required on reboot, and the phone shuts down after 5 failed attempts. This is documented in my phone setup. Client-side X509 certificates further mitigate those kinds of compromises, as the X509 certificate won't give shell access. Basically, if the phone is lost, all hell breaks loose: I need to change the email password (or revoke the certificate), as I assume the account is about to be compromised. I do not trust Android security to give me protection indefinitely. In fact, one could argue that the phone is already compromised and putting the password there already enabled a possible state-sponsored attacker to hijack my email address. This is why I have an OpenPGP key on my laptop to authenticate myself for critical operations like code signatures. The risk of identity theft from the state is, after all, a tautology: the state is the primary owner of identities, some could say by definition. So if a state-sponsored attacker would like to masquerade as me, they could simply issue a passport under my name and join an OpenPGP key signing party, and we'd have other problems to deal with, namely proper infiltration counter-measures and counter-snitching.

4 January 2016

Alessio Treglia: Enterprise Innovation in a Transformative Society

A recent article by professors Karim Lakhani and Marco Iansiti in the Harvard Business Review, "Digital Ubiquity: How Connection, Sensors and Data are Revolutionizing Business", gave me the opportunity for interesting insights and considerations.

Digital technology evolution and the development of modern Internet of Things devices are introducing huge transformative effects within social inter-relationships and their business models. These effects cannot be ignored if we want to perceive with the right clarity and meaning the innovation process that inevitably comes with them.

The three fundamental properties of digital technology

[Read More]

Lunar: Reproducible builds: week 36 in Stretch cycle

What happened in the reproducible builds effort between December 27th and January 2nd:

Infrastructure
dak now silently accepts and discards .buildinfo files (commit 1, 2), thanks to Niels Thykier and Ansgar Burchardt. This was later confirmed as working by Mattia Rizzolo.

Packages fixed
The following packages have become reproducible due to changes in their build dependencies: banshee-community-extensions, javamail, mono-debugger-libs, python-avro. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Untested changes:

reproducible.debian.net
The testing distribution (the upcoming stretch) is now tested on armhf. (h01ger) Four new armhf build nodes provided by Vagrant Cascadian were integrated in the infrastructure. This allowed for 9 new armhf builder jobs. (h01ger) The RPM-based build system, koji, is now in unstable and testing. (Marek Marczykowski-Górecki, Ximin Luo).

Package reviews
131 reviews have been removed, 71 added and 53 updated in the previous week. 58 new FTBFS reports were made by Chris Lamb and Chris West. New issues identified this week: nondeterminstic_ordering_in_gsettings_glib_enums_xml, nondeterminstic_output_in_warnings_generated_by_breathe, qt_translate_noop_nondeterminstic_ordering.

Misc.
Steven Chamberlain explained at length why reproducible cross-building across architectures mattered, and posted results of his tests comparing a stage1 debootstrapped chroot of linux-i386 once done from official Debian packages, the others cross-built from kfreebsd-amd64.

3 January 2016

Lunar: Reproducible builds: week 35 in Stretch cycle

What happened in the reproducible builds effort between December 20th and December 26th:

Toolchain fixes
Mattia Rizzolo rebased our experimental versions of debhelper (twice!) and dpkg on top of the latest releases. Reiner Herrmann submitted a patch for mozilla-devscripts to sort the file list in generated preferences.js files. To be able to lift the restriction that packages must be built in the same path, translation support for the __FILE__ C pre-processor macro would also be required. Joerg Sonnenberger submitted a patch back in 2010 that would still be useful today. Chris Lamb started work on providing a deterministic mode for debootstrap.

Packages fixed
The following packages have become reproducible due to changes in their build dependencies: bouncycastle, cairo-dock-plug-ins, darktable, gshare, libgpod, pafy, ruby-redis-namespace, ruby-rouge, sparkleshare. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet:

reproducible.debian.net
Statistics for package sets are now visible for the armhf architecture. (h01ger) The second build now has a longer timeout (18 hours) than the first build (12 hours). This should prevent wasting resources when a machine is loaded. (h01ger) Builds of Arch Linux packages are now done using a tmpfs. (h01ger) 200 GiB have been added to jenkins.debian.net (thanks to ProfitBricks!) to make room for new jobs. The current count is at 962 and growing!

diffoscope development
Aside from some minor bugs that have been fixed, a one-line change made huge memory (and time) savings as the output of the transformation tool is now streamed line by line instead of loaded entirely in memory at once.

disorderfs development
Andrew Ayer released disorderfs version 0.4.2-1 on December 22nd. It fixes a memory corruption error when processing command line arguments that could cause command line options to be ignored.

Documentation update
Many small improvements for the documentation on reproducible-builds.org sent by Georg Koppen were merged.

Package reviews
666 (!) reviews have been removed, 189 added and 162 updated in the previous week. 151 new fail to build from source reports have been made by Chris West, Chris Lamb, Mattia Rizzolo, and Niko Tyni. New issues identified: unsorted_filelist_in_xul_ext_preferences, nondeterminstic_output_generated_by_moarvm.

Misc.
Steven Chamberlain drew our attention to one analysis of the Juniper ScreenOS Authentication Backdoor: "Whilst this may have been added in source code, it was well-disguised in the disassembly and just 7 instructions long. I thought this was a good example of the current state-of-the-art, and why we'd like our binaries and eventually, installer and VM images reproducible IMHO." Joanna Rutkowska has mentioned possible ways for Qubes to become reproducible on their development mailing list.

27 October 2015

Marco d'Itri: Per-process netfilter rules

This article documents how the traffic of specific Linux processes can be subjected to a custom firewall or routing configuration, thanks to the magic of cgroups. We will use the Network classifier cgroup, which allows tagging the packets sent by specific processes. To create the cgroup which will be used to identify the processes I added something like this to /etc/rc.local:
mkdir /sys/fs/cgroup/net_cls/unlocator
/bin/echo 42 > /sys/fs/cgroup/net_cls/unlocator/net_cls.classid
chown md: /sys/fs/cgroup/net_cls/unlocator/tasks
The tasks file, which controls the membership of processes in a cgroup, is made writeable by my user: this way I can add new processes without becoming root. 42 is the arbitrary class identifier that the kernel will associate with the packets generated by the member processes. A command like systemd-cgls /sys/fs/cgroup/net_cls/ can be used to explore which processes are in which cgroup. I use a simple shell wrapper to start a shell or a new program as members of this cgroup:
#!/bin/sh -e
CGROUP_NAME=unlocator
if [ ! -d /sys/fs/cgroup/net_cls/$CGROUP_NAME/ ]; then
  echo "The $CGROUP_NAME net_cls cgroup does not exist!" >&2
  exit 1
fi
/bin/echo $$ > /sys/fs/cgroup/net_cls/$CGROUP_NAME/tasks
if [ $# = 0 ]; then
  exec ${SHELL:-/bin/sh}
fi
exec "$@"
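For example, saving the wrapper somewhere in the PATH (the name below is just an illustration, not from the original article) makes it easy to start a shell or a single program inside the cgroup:
# start an interactive shell whose traffic will carry classid 42
cgroup-run
# or run a single program inside the cgroup
cgroup-run firefox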
My first goal is to use a special name server for the DNS queries of some processes, thanks to a second dnsmasq process which acts as a caching forwarder. /etc/dnsmasq2.conf:
port=5354
listen-address=127.0.0.1
bind-interfaces
no-dhcp-interface=*
no-hosts
no-resolv
server=185.37.37.37
server=185.37.37.185
/etc/systemd/system/dnsmasq2.service:
[Unit]
Description=dnsmasq - Second instance
Requires=network.target
[Service]
ExecStartPre=/usr/sbin/dnsmasq --test
ExecStart=/usr/sbin/dnsmasq --keep-in-foreground --conf-file=/etc/dnsmasq2.conf
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/run/dnsmasq/dnsmasq.pid
[Install]
WantedBy=multi-user.target
Do not forget to enable the new service:
systemctl enable dnsmasq2
systemctl start dnsmasq2
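To confirm that the second dnsmasq instance answers on the alternate port, a quick manual query works; a sketch, assuming dig is installed:
# query the caching forwarder directly on port 5354
dig @127.0.0.1 -p 5354 example.com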
Since the cgroup match extension is not yet available in a released version of iptables, you will first need to build and install it manually:
git clone git://git.netfilter.org/iptables.git
cd iptables
./autogen.sh
./configure
make -k
sudo cp extensions/libxt_cgroup.so /lib/xtables/
sudo chmod -x /lib/xtables/libxt_cgroup.so
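Before writing any rules, it is worth checking that iptables can actually find the new match extension; if the shared object was installed correctly, the following prints the cgroup match options instead of an error:
# the help output should include a section describing the cgroup match
iptables -m cgroup --help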
The netfilter configuration required is very simple: all DNS traffic from the marked processes is redirected to the port of the local dnsmasq2:
iptables -t nat -A OUTPUT -m cgroup --cgroup 42 -p udp --dport 53 -j REDIRECT --to-ports 5354
iptables -t nat -A OUTPUT -m cgroup --cgroup 42 -p tcp --dport 53 -j REDIRECT --to-ports 5354
For related reasons, I also need to disable IPv6 for these processes:
ip6tables -A OUTPUT -m cgroup --cgroup 42 -j REJECT
I use a different cgroup to force some programs to use my office VPN by first setting a netfilter packet mark on their traffic:
iptables -t mangle -A OUTPUT -m cgroup --cgroup 43 -j MARK --set-mark 43
The packet mark is then used to policy-route this traffic using a dedicated VRF, i.e. routing table 43:
ip rule add fwmark 43 table 43
This VPN VRF just contains a default route for the VPN interface:
ip route add default dev tun0 table 43
Depending on your local configuration it may be a good idea to also add to the VPN VRF the routes of your local interfaces:
ip route show scope link proto kernel | \
    xargs -I ROUTE ip route add ROUTE table 43
Since the source address selection happens before the traffic is diverted to the VPN, we also need to source-NAT to the VPN address the marked packets:
iptables -t nat -A POSTROUTING -m mark --mark 43 --out-interface tun0 -j MASQUERADE
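Once everything is in place, the policy routing can be verified quickly; a sketch of the checks I would run:
# confirm the fwmark rule and the contents of routing table 43
ip rule show
ip route show table 43
# check that the mangle and nat rules are matching packets (non-zero counters)
iptables -t mangle -L OUTPUT -v -n
iptables -t nat -L POSTROUTING -v -n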

9 October 2015

Antoine Beaupré: Finding Debian release history with etckeeper

I cannot praise etckeeper enough. Whether or not you use formal configuration management tools, you should have etckeeper installed, even if only to track the ad-hoc changes made in an emergency. For example, I just thought I could use it to figure out which Debian release I originally installed on this machine, and when I did the upgrades. It turns out it's amazingly easy:
cd /etc
sudo git log --oneline -p --no-prefix  /etc/issue
Here is the result here:
root@marcos:/etc# git log --oneline -p --no-prefix --pretty=format:'%s (%ai)' /etc/issue | cat
Initial commit (2011-02-23 00:14:08 -0500)
diff --git issue issue
new file mode 100644
index 0000000..9d52ed2
--- /dev/null
+++ issue
@@ -0,0 +1,2 @@
+Debian GNU/Linux 6.0 \n \l
+
saving uncommitted changes in /etc prior to apt run (2011-05-22 19:08:20 -0400)
diff --git issue issue
index 9d52ed2..647d490 100644
--- issue
+++ issue
@@ -1,2 +1,2 @@
-Debian GNU/Linux 6.0 \n \l
+Debian GNU/Linux wheezy/sid \n \l
committing changes in /etc after apt run (2013-01-23 09:40:27 -0500)
diff --git issue issue
index 647d490..d363ace 100644
--- issue
+++ issue
@@ -1,2 +1,2 @@
-Debian GNU/Linux wheezy/sid \n \l
+Debian GNU/Linux 7.0 \n \l
committing changes in /etc after apt run (2013-06-15 13:02:04 -0400)
diff --git issue issue
index d363ace..efc8255 100644
--- issue
+++ issue
@@ -1,2 +1,2 @@
-Debian GNU/Linux 7.0 \n \l
+Debian GNU/Linux 7 \n \l
committing changes in /etc after apt run (2014-02-02 23:28:12 -0500)
diff --git issue issue
index efc8255..e65d112 100644
--- issue
+++ issue
@@ -1,2 +1,2 @@
-Debian GNU/Linux 7 \n \l
+Debian GNU/Linux jessie/sid \n \l
committing changes in /etc after apt run (2014-12-18 11:32:43 -0500)
diff --git issue issue
index e65d112..6478eed 100644
--- issue
+++ issue
@@ -1,2 +1,2 @@
-Debian GNU/Linux jessie/sid \n \l
+Debian GNU/Linux 8 \n \l
Unfortunately, there aren't many more details about the exact upgrade points, especially since /etc/os-release is a symlink starting with Jessie. Besides, things are much more in flux than we would like to believe, especially when you run a rolling distribution like "testing", but it still gives a good idea of my upgrade history. There is of course more information available directly in git log, namely the exact package version changes. With some more command-line filtering, we can see exactly when each upgrade was done, including minor releases:
# git log --date iso --grep base-files --reverse | egrep "^commit|^Date|base-files"
commit 34ad962ff10c6e4e201378698e0fe0d4b03c8c39
Date:   2011-04-02 22:21:11 -0400
    -base-files 6.0
    +base-files 6.0squeeze1
commit 1cba7d2e097091e86eba1a2d8e4f5a9771e746a1
Date:   2011-07-09 19:40:05 -0400
    -base-files 6.3
    +base-files 6.4
commit cb409a0fbe2f95c3cd6a7c1ff0af263b55c7d597
Date:   2011-09-28 17:54:57 -0400
    -base-files 6.4
    +base-files 6.5
commit cf6d9dab6f79f4b50e5bd80dcba1769b0aa6c84b
Date:   2012-03-25 19:18:44 -0400
    -base-files 6.5
    +base-files 6.7
commit bb1b6aab4406276388542cefd7b4eff92d960533
Date:   2012-06-29 00:42:32 -0400
    -base-files 6.7
    +base-files 6.9
commit c6d9218ba75b3276ea44d949ef3410c35713d487
Date:   2012-09-29 14:25:01 -0400
    -base-files 6.9
    +base-files 6.11
commit bd730398e572c8403b2b9c0421df64a407669877
Date:   2013-01-23 09:40:27 -0500
    -base-files 6.11
    +base-files 7.1
commit af42616d72c4fe5c4a0e43ee8948031732735ec5
Date:   2013-06-15 13:02:04 -0400
    -base-files 7.1
    +base-files 7.1wheezy1
commit d2ef5df9c689073b62b4898a12e42bb8488c8cdc
Date:   2013-10-14 18:15:11 -0400
    -base-files 7.1wheezy1
    +base-files 7.1wheezy2
commit 104ea49559eb2b6b2aff21817b3980b274882a28
Date:   2013-12-14 11:07:09 -0500
    -base-files 7.1wheezy2
    +base-files 7.1wheezy3
commit bfa19b7ad737cb22c495af9429e922e6ec46202d
Date:   2014-02-02 23:28:12 -0500
    -base-files 7.1wheezy3
    +base-files 7.2
commit 830570ca4083af0c4b87c6c7c746c9513e260cf7
Date:   2014-05-15 11:20:13 -0400
    -base-files 7.2
    +base-files 7.3
commit 4c9e6ca21bb8189810199e6d518250432a62391d
Date:   2014-07-27 19:50:14 -0400
    -base-files 7.3
    +base-files 7.5
commit 4bc6abe2ba193cb6fac4eae00fa855eb32b86400
Date:   2014-10-20 16:14:09 -0400
    -base-files 7.5
    +base-files 7.6
commit a5e2ce476982201338220a314b1f9ccb03c99517
Date:   2014-11-28 19:45:56 -0500
    -base-files 7.6
    +base-files 7.10
commit eaa69b112fee60ef71938d9a725c07f9f29b2011
Date:   2014-12-18 11:32:43 -0500
    -base-files 7.10
    +base-files 8
commit f95a199df77030dc9ee6ab55bf4fa246fa88c959
Date:   2015-07-24 12:34:04 -0400
    -base-files 8
    +base-files 8+deb8u1
commit db4ab55b7327cecea54c5fe6a65560ba0e385978
Date:   2015-09-07 19:09:47 -0400
    -base-files 8+deb8u1
    +base-files 8+deb8u2
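Another file worth checking in the same way is /etc/debian_version, which etckeeper also tracks and which changes on every point release; a small sketch, not from the original article:
cd /etc
sudo git log --oneline -p --no-prefix /etc/debian_version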

16 August 2015

Ben Armstrong: McIntosh Run Hike the Greenbelt event, August 2015

Just came back from a wonderful time hiking with my daughter at the finale Hike the Greenbelt event at the Backlands/McIntosh Run! This photo slideshow is from my HP snapshot camera, which unfortunately has dust in the lens. But I felt it was more important to get the photos up soon than to get them perfect, so please click the photo below to start the slideshow. I want to thank everyone who made this event a success; I'm not going to name specific names as I'm sure I'll miss someone important. But thanks especially to Martin, the leader of our group.
Marcos Zentilli explaining geology of the region. Click photo to start slideshow.
[photo gallery: the barrens, Jack Pine in Jack Pine ;), and part of the waterway]

9 August 2015

Simon Kainz: DUCK challenge: week 5

Slightly delayed, but here are the stats for week 5 of the DUCK challenge: we had 10 packages fixed and uploaded by 10 different uploaders. A big "Thank You" to you!! Since the start of this challenge, a total of 59 packages were fixed. Here is a quick overview:
           Week 1  Week 2  Week 3  Week 4  Week 5  Week 6  Week 7
# Packages     10      15      10      14      10       -       -
Total          10      25      35      49      59       -       -
The list of the fixed and updated packages is available here. I will try to update this ~daily. If I missed one of your uploads, please drop me a line. Only 2 more weeks to DebConf15, so please get involved: the DUCK challenge is running until the end of DebConf15! Previous articles are here: Week 1, Week 2, Week 3, Week 4.

15 December 2014

Gustavo Noronha Silva: Web Engines Hackfest 2014

For the 6th year in a row, Igalia has organized a hackfest focused on web engines. The 5 years before this one were actually focused on the GTK+ port of WebKit, but the number of web engines that matter to us as Free Software developers and consultancies has grown, and so has the scope of the hackfest. It was a very productive and exciting event. It has already been covered by Manuel Rego, Philippe Normand, Sebastian Dröge and Andy Wingo! I am sure more blog posts will pop up. We had Martin Robinson telling us about the new Servo engine that Mozilla has been developing as a proof of concept for both Rust as a language for building big, complex products and for doing layout in parallel. Andy gave us a very good summary of where JS engines are in terms of performance and features. We had talks about CSS grid layouts, TyGL (a GL-powered implementation of the 2D painting backend in WebKit), the new Wayland port announced by Zan Dobersek, and a lot more. With help from my colleague ChangSeok OH, I presented a description of how a team at Collabora led by Marco Barisione made the combination of WebKitGTK+ and GNOME's web browser a pretty good experience for the Raspberry Pi. It took a not so small amount of both pragmatic limitations and hacks to get to a multi-tab browser that can play YouTube videos and be quite responsive, but we were very happy with how well WebKitGTK+ worked as a base for that.
One of my main goals for the hackfest was to help drive features that were lingering in the bug tracker for WebKitGTK+. I picked up a patch that had gone through a number of iterations and rewrites, the HTML5 notifications support, and with help from Carlos Garcia managed to finish it and land it on the last day of the hackfest! It provides new signals that can be used to authorize notifications and to show and close them. To make notifications work in the best case scenario, the only thing that the API user needs to do is handle the permission request, since we provide a default implementation for the show and close signals that uses libnotify if it is available when building WebKitGTK+.
Originally our intention was to use GNotification for the default implementation of those signals in WebKitGTK+, but it turned out to be a pain to use for our purposes. GNotification is tied to GApplication. This allows for some interesting features, like notifications being persistent and able to reactivate the application, but those make no sense in our current use case, although that may change once service workers become a thing. It can also be a bit problematic given we are a library and thus have no GApplication of our own. That was easily overcome by using the default GApplication of the process for notifications, though. The show stopper for us using GNotification was the way GNOME Shell currently deals with notifications sent using this mechanism: it will look for a .desktop file named after the application ID used to initialize the GApplication instance and reject the notification if it cannot find one. Besides making this a pain to test (our test browser would need a .desktop file to be installed), it would not work for our main API user! The application ID used for all Web instances is org.gnome.Epiphany at the moment, and that is not the same as any of the desktop files used either by the main browser or by the web apps created with it.
For the future we will probably move Epiphany towards this new era, and all users of the WebKitGTK+ API as well, but the strictness of GNOME Shell would hurt the usefulness of our default implementation right now, so we decided to stick to libnotify for the time being. Other than that, I managed to review a bunch of patches during the hackfest, and took part in many interesting discussions regarding the next steps for GNOME Web and the GTK+ and Wayland ports of WebKit, such as the potential introduction of a threaded compositor, which is pretty exciting. We also tried to have Bastien Nocera as a guest participant for one of our sessions, but it turns out that requires more than a notebook on top of a bench hooked up to a TV to work well. We could think of something next time ;D. I'd like to thank Igalia for organizing and sponsoring the event, Collabora for sponsoring and sending ChangSeok and myself over to Spain from faraway Brazil and South Korea, and Adobe for also sponsoring the event! Hope to see you all next year!
Web Engines Hackfest 2014 sponsors: Adobe, Collabora and Igalia

18 November 2014

Antoine Beaupr : bup vs attic silly benchmark

after seeing attic introduced in a discussion about bup, i figured i could give it a try. it was answering two of my biggest concerns with bup, and seemed to appear magically out of nowhere and basically do everything i need, with an inline manual on top of it.

disclaimer Note: this is not a real benchmark! i would probably need to port bup and attic to liw's seivot software to report on this properly (and that would be amazing and really interesting, but it's late now). even worse, this was done on a production server with other stuff going on, so take the results with a grain of salt.

procedure and results Here's what I did: I set up backups of my ridiculously huge ~/src directory on the external hard drive where I usually make my backups. I ran a clean backup with attic, then redid it, then I ran a similar backup with bup, then redid it. Here are the results:
anarcat@marcos:~$ sudo apt-get install attic # this installed 0.13 on debian jessie amd64
[...]
anarcat@marcos:~$ attic init /mnt/attic-test:
Initializing repository at "/media/anarcat/calyx/attic-test"
Encryption NOT enabled.
Use the "--encryption=passphrase keyfile" to enable encryption.
anarcat@marcos:~$ time attic create --stats /mnt/attic-test::src ~/src/
Initializing cache...
------------------------------------------------------------------------------
Archive name: src
Archive fingerprint: 7bdcea8a101dc233d7c122e3f69e67e5b03dbb62596d0b70f5b0759d446d9ed0
Start time: Tue Nov 18 00:42:52 2014
End time: Tue Nov 18 00:54:00 2014
Duration: 11 minutes 8.26 seconds
Number of files: 283910
                       Original size      Compressed size    Deduplicated size
This archive:                6.74 GB              4.27 GB              2.99 GB
All archives:                6.74 GB              4.27 GB              2.99 GB
------------------------------------------------------------------------------
311.60user 68.28system 11:08.49elapsed 56%CPU (0avgtext+0avgdata 122824maxresident)k
15279400inputs+6788816outputs (0major+3258848minor)pagefaults 0swaps
anarcat@marcos:~$ time attic create --stats /mnt/attic-test::src-2014-11-18 ~/src/
------------------------------------------------------------------------------
Archive name: src-2014-11-18
Archive fingerprint: be840f1a49b1deb76aea1cb667d812511943cfb7fee67f0dddc57368bd61c4bf
Start time: Tue Nov 18 00:05:57 2014
End time: Tue Nov 18 00:06:35 2014
Duration: 38.15 seconds
Number of files: 283910
                       Original size      Compressed size    Deduplicated size
This archive:                6.74 GB              4.27 GB            116.63 kB
All archives:               13.47 GB              8.54 GB              3.00 GB
------------------------------------------------------------------------------
30.60user 4.66system 0:38.38elapsed 91%CPU (0avgtext+0avgdata 104688maxresident)k
18264inputs+258696outputs (0major+36892minor)pagefaults 0swaps
anarcat@marcos:~$ sudo apt-get install bup # this installed bup 0.25
anarcat@marcos:~$ free && sync && echo 3 | sudo tee /proc/sys/vm/drop_caches && free # flush caches
anarcat@marcos:~$ export BUP_DIR=/mnt/bup-test
anarcat@marcos:~$ bup init
Dépôt Git vide initialisé dans /mnt/bup-test/
anarcat@marcos:~$ time bup index ~/src
Indexing: 345249, done.
56.57user 14.37system 1:45.29elapsed 67%CPU (0avgtext+0avgdata 85236maxresident)k
699920inputs+104624outputs (4major+25970minor)pagefaults 0swaps
anarcat@marcos:~$ time bup save -n src ~/src
Reading index: 345249, done.
bloom: creating from 1 file (200000 objects).
bloom: adding 1 file (200000 objects).
bloom: creating from 3 files (600000 objects).
Saving: 100.00% (6749592/6749592k, 345249/345249 files), done.
bloom: adding 1 file (126005 objects).
383.08user 61.37system 10:52.68elapsed 68%CPU (0avgtext+0avgdata 194256maxresident)k
14638104inputs+5944384outputs (50major+299868minor)pagefaults 0swaps
anarcat@marcos:attic$ time bup index ~/src
Indexing: 345249, done.
56.13user 13.08system 1:38.65elapsed 70%CPU (0avgtext+0avgdata 133848maxresident)k
806144inputs+104824outputs (137major+38463minor)pagefaults 0swaps
anarcat@marcos:attic$ time bup save -n src2 ~/src
Reading index: 1, done.
Saving: 100.00% (0/0k, 1/1 files), done.
bloom: adding 1 file (1 object).
0.22user 0.05system 0:00.66elapsed 42%CPU (0avgtext+0avgdata 17088maxresident)k
10088inputs+88outputs (39major+15194minor)pagefaults 0swaps
Disk usage is comparable:
anarcat@marcos:attic$ du -sc /mnt/*attic*
2943532K        /mnt/attic-test
2969544K        /mnt/bup-test
People are encouraged to try and reproduce those results, which should be fairly trivial.

Observations Here are interesting things I noted while working with both tools:
  • attic is Python3: i could compile it, with dependencies, by doing apt-get build-dep attic and running setup.py - i could also install it with pip if i needed to (but i didn't)
  • bup is Python 2, and has a scary makefile
  • both have an init command that basically does almost nothing and takes little enough time that i'm ignoring it in the benchmarks
  • attic backups are a single command, bup requires me to know that i first want to index and then save, which is a little confusing
  • bup has nice progress information, especially during save (because when it loaded the index, it knew how much was remaining) - just because of that, bup "feels" faster
  • bup, however, lets me know about its deep internals (like now i know it uses a bloom filter) which is probably barely understandable by most people
  • on the contrary, attic gives me useful information about the size of my backups, including the size of the current increment
  • it is not possible to get that information from bup, even after the fact - you need to run du before and after the backup (see the sketch after this list)
  • attic modifies the files access times when backing up, while bup is more careful (there's a pull request to fix this in attic, which is how i found out about this)
  • both backup systems seem to produce roughly the same data size from the same input
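A minimal sketch of that du workaround, assuming the same BUP_DIR as in the transcript above; the difference between the two numbers approximates the size of the new increment:
before=$(du -ks "$BUP_DIR" | cut -f1)
bup index ~/src && bup save -n src ~/src
after=$(du -ks "$BUP_DIR" | cut -f1)
echo "increment: $((after - before)) KiB"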

Summary attic and bup are about equally fast. bup took 30 seconds less than attic to save the files, but that's not counting the 1m45s it took indexing them, so on total run time, bup was actually slower. attic is also (almost) two times faster on the second run. but this could be within the margin of error of this very quick experiment, so my provisional verdict for now would be that they are about as fast. bup may be more robust (for example it doesn't modify the atimes), but this has not been extensively tested and is based more on my familiarity with the "conservatism" of the bup team than on actual tests. considering all the features promised by attic, it makes for a really serious contender to the already amazing bup.

Next steps To do this properly, we would need to:
  • include other software (thinking of Zbackup, Burp, ddar, obnam, rdiff-backup and duplicity)
  • bench attic with the noatime patch
  • bench dev attic vs dev bup
  • bench data removal
  • bench encryption
  • test data recovery
  • run multiple backup runs, on different datasets, on a cleaner environment
  • ideally, extend seivot to do all of that
Note that the Burp author already did an impressive comparative benchmark of a bunch of those tools for the burp2 design paper, but it unfortunately doesn't include attic or clear ways to reproduce the results.

4 November 2014

Marco d'Itri: My position on the "init system coupling" General Resolution

I first want to clarify for the people not intimately involved with Debian that the GR-2014-003 vote is not about choosing the default init system or deciding if sysvinit should still be supported: its outcome will not stop systemd from being Debian's default init system and will not prevent any interested developers from supporting sysvinit. Some non-developers have recently threatened to "fork Debian" if this GR does not pass, apparently without understanding the concept well: Debian welcomes forks and I think that having more users working on free software would be great no matter which init system they favour. The goal of Ian Jackson's proposal is to force the maintainers who want to use the superior features of systemd in their packages to spend their time on making them work with sysvinit as well. This is antisocial and also hard to reconcile with the Debian Constitution, which states: 2.1.1 Nothing in this constitution imposes an obligation on anyone to do work for the Project. A person who does not want to do a task which has been delegated or assigned to them does not need to do it. [...] As has been patiently explained by many other people, this proposal is unrealistic: if the maintainers of some packages were not interested in working on support for sysvinit and nobody else submitted patches, then we would probably still have to release them as is even if formally declared unsuitable for a release. On the other hand, if somebody is interested in working on sysvinit support then there is no need for a GR forcing them to do it. The most elegant outcome of this GR would be a victory of choice 4 ("please do not waste everybody's time with pointless general resolutions"), but Ian Jackson has been clear enough in explaining how he sees the future of this debate: If my GR passes we will only have to have this conversation if those who are outvoted do not respect the project's collective decision. If my GR fails I expect a series of bitter rearguard battles over individual systemd dependencies. There are no significant practical differences between choices 2 "support alternative init systems as much as possible" and 3 "packages may require specific init systems if maintainers decide", but the second option is more explicit in supporting the technical decisions of maintainers and upstream developers. This is why I think that we need a stronger outcome to prevent discussing this over and over, no matter how each one of us feels about working personally on sysvinit support in the future. I will vote for choices 3, 2, 4, 1.

14 October 2014

Marco d'Itri: The Italian peering ecosystem

I published the slides of my talk "An introduction to peering in Italy - Interconnections among the Italian networks" that I presented today at the MIX-IT (the Milano internet exchange) technical meeting.

3 October 2014

Marco d'Itri: 15 years of whois

Exactly 15 years ago I uploaded to Debian the first release of my whois client. At the end of 1999 the United States Government forced Network Solutions, at the time the only registrar for the .com, .net and .org top level domains, to split their functions into a registry and a registrar and to allow competing registrars to operate. Since then, two whois queries are needed to access the data for a domain in a TLD operating with a thin registry model: first one to the registry to find out which registrar was used to register the domain, and then one to that registrar to actually get the data. Being as lazy as I am, I thought that this was unacceptable, so I implemented a whois client that would know which whois server to query for all TLDs and then automatically follow the referrals to the registrars. But the initial reason for writing this program was to replace the simplistic BSD-derived whois client that was shipped with Debian with one that would know which server to query for IP addresses and autonomous system numbers, a useful feature in a time when people still used to manually report all their spam to the originating ISPs. Over the years I have spent countless hours searching for the right servers for the domains of far away countries (something that has often been incredibly instructive) and now the program database is usually more up to date than the official IANA one. One of my goals for this program has always been wide portability, so I am happy that over the years it was adopted by other Linux distributions, made available by third parties to all common variants of UNIX and even to systems as alien as Windows and OS/2. Now that whois is 15 years old I am happy to announce that I have recently achieved complete world domination and that all Linux distributions use it as their default whois client.
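As an illustration of what the client automates for thin-registry TLDs, here is the manual two-step lookup for a .com domain; whois.verisign-grs.com is the .com registry server, and the registrar server used in the second step is whichever one the first query reports (the name below is only a placeholder):
# step 1: ask the registry which registrar's whois server holds the data
whois -h whois.verisign-grs.com example.com | grep -i 'whois server'
# step 2: query that registrar server for the actual registrant data, e.g.
# whois -h whois.registrar.example example.com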

29 September 2014

Marco d'Itri: CVE-2014-6271 fix for Debian woody, sarge, etch and lenny

Very old Debian releases like woody (3.0), sarge (3.1), etch (4.0) and lenny (5.0) are not supported anymore by the Debian Security Team and do not get security updates. Since some of our customers still have servers running these versions, I have built bash packages with the fix for CVE-2014-6271 (the "shellshock" bug) and Florian Weimer's patch which restricts the parsing of shell functions to specially named variables: http://ftp.linux.it/pub/People/md/bash/ This work has been sponsored by my employer Seeweb, a hosting, cloud infrastructure and colocation provider.
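For anyone applying these packages, the widely circulated one-liner to check whether a given bash is still affected by CVE-2014-6271 is:
env x='() { :;}; echo vulnerable' bash -c 'echo this is a test'
# a patched bash prints only "this is a test"; a vulnerable one also prints "vulnerable"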

18 August 2014

Julien Danjou: OpenStack Ceilometer and the Gnocchi experiment

A little more than 2 years ago, the Ceilometer project was launched inside the OpenStack ecosystem. Its main objective was to measure OpenStack cloud platforms in order to provide data and mechanisms for functionalities such as billing, alarming or capacity planning. In this article, I would like to relate what I've been doing with other Ceilometer developers in the last 5 months. I've lowered my direct involvement in Ceilometer itself to concentrate on solving one of its biggest issues at the source, and I think it's largely time to take a break and talk about it.
Ceilometer early design For the last years, Ceilometer didn't change its core architecture. Without diving too much into all its parts, one of the early design decisions was to build the metering around a data structure we called samples. A sample is generated each time Ceilometer measures something. It is composed of a few fields, such as the resource id that is metered, the user and project id owning that resource, the meter name, the measured value, a timestamp and some free-form metadata. Each time Ceilometer measures something, one of its components (an agent, a pollster...) constructs and emits a sample headed for the storage component that we call the collector. This collector is responsible for storing the samples into a database. The Ceilometer collector uses a pluggable storage system, meaning that you can pick any database system you prefer. Our original implementation has been based on MongoDB from the beginning, but we then added a SQL driver, and people contributed things such as HBase or DB2 support. The REST API exposed by Ceilometer allows various read requests to be executed on this data store. It can return the list of resources that have been measured for a particular project, or compute statistics on metrics. Allowing such a large range of possibilities and having such a flexible data structure makes it possible to do a lot of different things with Ceilometer, as you can query the data in almost any way you want.
The scalability issue We soon started to encounter scalability issues in many of the read requests made via the REST API. A lot of the requests require the data storage to do full scans of all the stored samples. Indeed, the fact that the API allows you to filter on any field and also on the free-form metadata (meaning non-indexed key/value tuples) has a terrible cost in terms of performance (as pointed out before, the metadata is attached to each sample generated by Ceilometer and is stored as is). That basically means that the sample data structure is stored in most drivers in just one table or collection, in order to be able to scan it at once, and there's no good "perfect" sharding solution, making data storage scalability painful. It turns out that the Ceilometer REST API is unable to handle most of the requests in a timely manner, as most operations are O(n) where n is the number of samples recorded (see big O notation if you're unfamiliar with it). That number of samples can grow very rapidly in an environment of thousands of metered nodes and with a data retention of several weeks. There are a few optimizations to make things smoother in general cases, fortunately, but as soon as you run specific queries, the API gets barely usable. During this last year, as the Ceilometer PTL, I discovered these issues first hand, since a lot of people were giving me exactly this kind of feedback.
We engaged several blueprints to improve the situation, but it was soon clear to me that this was not going to be enough anyway.
Thinking outside the box Unfortunately, the PTL job doesn't leave one enough time to work on the actual code nor to play with anything new. I was coping with most of the project bureaucracy and I wasn't able to work on any good solution to tackle the issue at its root. Still, I had a few ideas that I wanted to try, and as soon as I stepped down from the PTL role, I stopped working on Ceilometer itself to try something new and to think a bit outside the box. When one takes a look at what has been brought into Ceilometer recently, one can see that Ceilometer actually needs to handle 2 types of data: events and metrics. Events are data generated when something happens: an instance starts, a volume is attached, or an HTTP request is sent to a REST API server. These are events that Ceilometer needs to collect and store. Most OpenStack components are able to send such events using the notification system built into oslo.messaging. Metrics are what Ceilometer needs to store but that are not necessarily tied to an event. Think about an instance's CPU usage, a router's network bandwidth usage, the number of images that Glance is storing for you, etc. These are not events, since nothing is happening. These are facts, states we need to meter. Computing statistics for billing or capacity planning requires both of these data sources, but they should be distinct. Based on that assumption, and the fact that Ceilometer was getting support for storing events, I started to focus on getting the metric part right. I had been a system administrator for a decade before jumping into OpenStack development, so I know a thing or two about how monitoring is done in this area, and what kind of technology operators rely on. I also know that there's still no silver bullet, which made it a good challenge. The first thing that came to my mind was to use some kind of time-series database, and export access to it via a REST API as we do in all OpenStack services. This should cover the metric storage pretty well.
Cooking Gnocchi
A cloud of gnocchis!
At the end of April 2014, this led me to start a new project code-named Gnocchi. For the record, the name was picked after confusing the OpenStack Marconi project so many times, reading OpenStack Macaroni instead. At least one OpenStack project should have a "pasta" name, right? The point of having a new project and not sending patches to Ceilometer was that, first, I had no clue if it was going to produce something that would be any better, and second, I would be able to iterate more rapidly without being strongly coupled to the release process. The first prototype started around the following idea: what you want is to meter things. That means storing a list of (timestamp, value) tuples for each of them. I've named these things "entities", as no assumptions are made about what they are. An entity can represent the temperature in a room or the CPU usage of an instance. The service shouldn't care and should be agnostic in this regard. One feature that we discussed over several OpenStack summits in the Ceilometer sessions was the idea of doing aggregation, meaning aggregating samples over a period of time to only store a smaller amount of them. These are things that time-series formats such as RRDtool have been doing on the fly for a long time, and I decided it was a good trail to follow. I assumed that this was going to be a requirement when storing metrics into Gnocchi. The user would need to specify what kind of archiving they need: 1 second precision over a day, 1 hour precision over a year, or even both. The first driver written to achieve that and store those metrics inside Gnocchi was based on whisper. Whisper is the file format used to store metrics for the Graphite project. For the actual storage, the driver uses Swift, which has the advantages of being part of OpenStack and scalable. Storing the metrics for each entity in a different whisper file and putting them in Swift turned out to have a fantastic algorithmic complexity: it was O(1). Indeed, the complexity needed to store and retrieve metrics doesn't depend on the number of metrics you have nor on the number of things you are metering, which is already a huge win compared to the current Ceilometer collector design. However, it turned out that whisper has a few limitations that I was unable to circumvent in any manner. I needed to patch it to remove a lot of its assumptions about manipulating files, or that everything is relative to now (time.time()). I started to hack on that in my own fork, but then everything broke. The whisper project code base is, well, not the state of the art, and has zero unit tests. I was staring at a huge effort to transform whisper into the time-series format I wanted, without being sure I wasn't going to break everything (remember, no test coverage). I decided to take a break and look into alternatives, and stumbled upon Pandas, a data manipulation and statistics library for Python. It turns out that Pandas supports time-series natively, and that it could do a lot of the smart computation needed in Gnocchi. I built a new file format leveraging Pandas for computing the time-series and named it carbonara (a wink to both the Carbon project and pasta, how clever!). The code is quite small (a third of whisper's, 200 SLOC vs 600 SLOC), does not have many of the whisper limitations and it has test coverage. These Carbonara files are then, in the same fashion, stored into Swift containers. Anyway, the Gnocchi storage driver system is designed in the same spirit as the rest of the OpenStack and Ceilometer storage driver systems.
It's a plug-in system with an API, so anyone can write their own driver. Eoghan Glynn has already started to write an InfluxDB driver, working closely with the upstream developer of that database. Dina Belova started to write an OpenTSDB driver. This helps to make sure the API is designed in the right way from the start.
Handling resources Measuring individual entities is great and needed, but you also need to link them with resources. When measuring the temperature and the number of people in a room, it is useful to link these 2 separate entities to a resource, in this case the room, and to give a name to these relations, so one is able to identify what attribute of the resource is actually measured. It is also important to provide the possibility to store attributes on these resources, such as their owners, the time they started and ended their existence, etc.
Relationship of entities and resources
Once this list of resources is collected, the next step is to list and filter them, based on any criteria. One might want to retrieve the list of resources created last week, or the list of instances hosted on a particular node right now. Resources also need to be specialized. Some resources have attributes that must be stored in order for filtering to be useful. Think about an instance name or a router network. All of these requirements led to the design of what's called the indexer. The indexer is responsible for indexing entities and resources, and for linking them together. The initial implementation is based on SQLAlchemy and should be pretty efficient. It's easy enough to index the most requested attributes (columns), and they are also correctly typed. We plan to establish a model for all known OpenStack resources (instances, volumes, networks, ...) to store and index them into the Gnocchi indexer in order to request them in an efficient way from one place. The generic resource class can be used to handle generic resources that are not tied to OpenStack; it'd be up to the users to store extra attributes. Dropping the free-form metadata we used to have in Ceilometer makes sure that querying the indexer is going to be efficient and scalable.
The indexer classes and their relations
REST API All of this is exported via a REST API that was partially designed and documented in the Gnocchi specification in the Ceilometer repository, though the spec is not up-to-date yet. We plan to auto-generate the documentation from the code as we are currently doing in Ceilometer. The REST API is pretty easy to use, and you can use it to manipulate entities and resources, and request the information back.
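To give a rough idea of the shape of such calls, here is a purely illustrative sketch; the paths and field names below are placeholders of my own, not the actual routes from the Gnocchi specification:
# create an entity with an archive policy, then push a measure to it (hypothetical endpoints)
curl -X POST http://gnocchi.example.com/v1/entity -d '{"archive_policy": "low"}'
curl -X POST http://gnocchi.example.com/v1/entity/ENTITY_ID/measures -d '[{"timestamp": "2014-08-18T10:00:00", "value": 42.0}]'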
Macroscopic view of the Gnocchi architecture
Roadmap & Ceilometer integration All of this plan has been exposed and discussed with the Ceilometer team during the last OpenStack summit in Atlanta in May 2014, for the Juno release. I led a session about this entire concept, and convinced the team that using Gnocchi for our metric storage would be a good approach to solve the Ceilometer collector scalability issue. It was decided to conduct this project experiment in parallel of the current Ceilometer collector for the time being, and see where that would lead the project to. Early benchmarks Some engineers from Mirantis did a few benchmarks around Ceilometer and also against an early version of Gnocchi, and Dina Belova presented them to us during the mid-cycle sprint we organized in Paris in early July. The following graph sums up pretty well the current Ceilometer performance issue. The more you feed it with metrics, the more slow it becomes.
For Gnocchi, while the numbers themselves are not fantastic, what is interesting is that all the graphs below show that performance is stable, with no correlation with the number of resources, entities or measures. This proves that, indeed, most of the code is built around a complexity of O(1), and not O(n) anymore.
Next steps
Clément drawing the logo
While the Juno cycle is being wrapped up for most projects, including Ceilometer, Gnocchi development is still ongoing. Fortunately, the composite architecture of Ceilometer allows a lot of its features to be replaced by other code dynamically. That, for example, enables Gnocchi to provide a Ceilometer dispatcher plugin for its collector, without having to ship the actual code in Ceilometer itself. That should help Gnocchi development avoid being slowed down by the release process for now. The Ceilometer team aims to provide Gnocchi as a sort of technology preview with the Juno release, allowing it to be deployed alongside and plugged into Ceilometer. We'll probably discuss how to integrate it into the project in a more permanent and stronger manner during the OpenStack Summit for Kilo that will take place next November in Paris.

16 July 2014

Vincent Sanders

It is no great secret that my colleagues at Collabora have been doing work with the Raspberry Pi Foundation.

My desk is very near Marco and I often see him working with the various Pi boards. Recently he obtained one of the new B+ units for testing and I thought it looked a little sad sat naked on his desk.

To remedy this bare board problem I designed and built a laser cut case for him, and now that the B+ has been publicly released I can make the design freely available.

The design is completely original, though it is inspired by several other plastic "clip" type designs I have seen. I originally created and debugged the case design for my parallella, though tweaking it for the Pi was pretty easy.

The design is under a CC attribution licence, and I ought to say that my employer is in no way responsible for this; it's all my own fault.

1 April 2014

Marco d'Itri: Real out of band connectivity with network namespaces

This post explains how to configure, on a Linux server, a second and totally independent network interface with its own connectivity, which can be useful to access the server when the regular connectivity is broken. This is possible thanks to network namespaces, a virtualization feature available in recent kernels. We need to create a simple script to be run at boot time which will create and configure the namespace. First, move the network interface which will be dedicated to it into the new namespace:
ip netns add oob
ip link set eth2 netns oob
And then configure it as usual with iproute, by executing it in the new namespace with ip netns exec:
ip netns exec oob ip link set lo up
ip netns exec oob ip link set eth2 up
ip netns exec oob ip addr add 192.168.1.2/24 dev eth2
ip netns exec oob ip route add default via 192.168.1.1
The interface must be configured manually because ifupdown does not support namespaces yet, and it would use the same /run/network/ifstate file which tracks the interfaces of the main namespace (this is also a good argument in favour of something persistent like Network Manager...). Now we can start any daemon in the namespace; just make sure that it will not interfere with the on-disk state of other instances:
ip netns exec oob /usr/sbin/sshd -o PidFile=/run/sshd-oob.pid
Netfilter is virtualized as well, so we can load a firewall configuration which will be applied only to the new namespace:
ip netns exec oob iptables-restore < /etc/network/firewall-oob-v4
As documented in ip-netns(8), iproute netns add will also create a mount namespace and bind mount in it the files in /etc/netns/$NAMESPACE/: this is very useful since some details of the configuration, like the name server IP, will be different in the new namespace:
mkdir -p /etc/netns/oob/
echo 'nameserver 8.8.8.8' > /etc/netns/oob/resolv.conf
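Putting the pieces together, the boot-time script mentioned at the beginning could look roughly like this sketch, assuming the same eth2 interface and addresses as in the examples above:
#!/bin/sh -e
# create the namespace and move the dedicated interface into it
ip netns add oob
ip link set eth2 netns oob
# configure the interface inside the namespace
ip netns exec oob ip link set lo up
ip netns exec oob ip link set eth2 up
ip netns exec oob ip addr add 192.168.1.2/24 dev eth2
ip netns exec oob ip route add default via 192.168.1.1
# per-namespace firewall and dedicated out of band sshd
ip netns exec oob iptables-restore < /etc/network/firewall-oob-v4
ip netns exec oob /usr/sbin/sshd -o PidFile=/run/sshd-oob.pid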
If we connect to the second SSH daemon, it will create a shell in the second namespace. To enter the main one, i.e. the one used by PID 1, we can use a simple script like:
#!/bin/sh -e
exec nsenter --net --mount --target 1 "$@"
To reach the out of band namespace from the main one we can use instead:
#!/bin/sh -e
exec nsenter --net --mount --target $(cat /var/run/sshd-oob.pid) "$@"
Scripts like these can also be used in fun ssh configurations like:
Host 10.2.1.*
 ProxyCommand ssh -q -a -x -N -T server-oob.example.net 'nsenter-main nc %h %p'

20 February 2014

Marco d'Itri: Automatically unlocking xscreensaver in some locations

When I am at home I do not want to be bothered by the screensaver locking my laptop. To solve this, I use a custom PAM configuration which checks if I am authenticated to the local access point. Add this to the top of /etc/pam.d/xscreensaver:
auth sufficient pam_exec.so quiet /usr/local/sbin/pam_auth_xscreensaver
And then use a script like this one to decide when you want the display to be automatically unlocked:
#!/bin/sh -e
# return the ESSID of this interface
current_essid() {
  /sbin/iwconfig $1 | sed -nre '/ESSID/s/.*ESSID:"([^"]+)".*/\1/p'
}
# automatically unlock only for these users
case "$PAM_USER" in
  "")   echo "This program must be run by pam_auth.so!"
        exit 1
        ;;
  md)   ;;
  *)    exit 1
        ;;
esac
CURRENT_ESSID=$(current_essid wlan0)
# automatically unlock when connected to these networks
case "$CURRENT_ESSID" in
  MYOWNESSID) exit 0 ;;
esac
exit 6
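To check the script by hand before relying on it, one can simulate what pam_exec.so does by setting PAM_USER in the environment and looking at the exit status, for instance:
PAM_USER=md /usr/local/sbin/pam_auth_xscreensaver && echo "will auto-unlock" || echo "will ask for the password"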
